Dense and Sparse Matrix Operations on the Cell Processor
Authors
Abstract
The slowing pace of commodity microprocessor performance improvements, combined with ever-increasing chip power demands, has become of utmost concern to computational scientists. Therefore, the high performance computing community is examining alternative architectures that address the limitations of modern superscalar designs. In this work, we examine STI’s forthcoming Cell processor: a novel, low-power architecture that combines a PowerPC core with eight independent SIMD processing units coupled with a software-controlled memory to offer high FLOP/s/Watt. Since neither Cell hardware nor cycle-accurate simulators are currently publicly available, we develop an analytic framework to predict Cell performance on dense and sparse matrix operations, using a variety of algorithmic approaches. Results demonstrate Cell’s potential to deliver more than an order of magnitude better GFLOP/s per watt than the Intel Itanium2 and Cray X1 processors.
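For intuition, the kind of bound such an analytic framework produces can be illustrated in a few lines. The sketch below is a minimal, generic estimate, not the authors' actual model: it assumes computation on the eight SIMD cores overlaps with DMA transfers (double buffering), so predicted time is set by whichever resource saturates first. The peak-rate and bandwidth figures are illustrative assumptions.

```python
# Minimal sketch of a max(compute, memory) performance bound for a blocked
# kernel on a Cell-like chip. All parameter values are illustrative
# assumptions, not figures taken from the paper.

def predicted_gflops(flops, bytes_moved,
                     peak_gflops_per_core=25.6, num_cores=8,
                     dram_bandwidth_gbs=25.6):
    """Upper-bound GFLOP/s for a kernel doing `flops` operations and moving
    `bytes_moved` bytes, assuming perfect overlap of computation and DMA."""
    compute_time = flops / (peak_gflops_per_core * 1e9 * num_cores)
    memory_time = bytes_moved / (dram_bandwidth_gbs * 1e9)
    time = max(compute_time, memory_time)   # whichever resource saturates first
    return flops / time / 1e9

# Example: SpMV with ~2 flops and ~10 bytes per nonzero is bandwidth bound.
print(predicted_gflops(flops=2e6, bytes_moved=10e6))
```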
Similar References
Cache Oblivious Dense and Sparse Matrix Multiplication Based on Peano Curves
Cache-oblivious algorithms are designed to benefit from any existing cache hierarchy, regardless of cache size or architecture. In matrix computations, cache-oblivious methods are usually obtained from block-recursive formulations. In this article, we extend an existing cache-oblivious approach for matrix operations, which is based on Peano space-filling curves, for multiplication of sparse and...
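As a rough illustration of the block-recursive idea behind such cache-oblivious methods, the sketch below multiplies matrices by recursively splitting the largest dimension. It is not the Peano-curve ordering the article develops, and the base-case threshold of 64 is an arbitrary assumption.

```python
import numpy as np

def recursive_matmul(A, B, C, threshold=64):
    """Accumulate A @ B into C by recursively splitting the largest dimension."""
    m, k = A.shape
    _, n = B.shape
    if max(m, n, k) <= threshold:
        C += A @ B                      # small blocks fit in cache
        return
    if m >= n and m >= k:               # split rows of A and C
        h = m // 2
        recursive_matmul(A[:h], B, C[:h], threshold)
        recursive_matmul(A[h:], B, C[h:], threshold)
    elif n >= k:                        # split columns of B and C
        h = n // 2
        recursive_matmul(A, B[:, :h], C[:, :h], threshold)
        recursive_matmul(A, B[:, h:], C[:, h:], threshold)
    else:                               # split the shared dimension k
        h = k // 2
        recursive_matmul(A[:, :h], B[:h], C, threshold)
        recursive_matmul(A[:, h:], B[h:], C, threshold)

A, B = np.random.rand(300, 200), np.random.rand(200, 250)
C = np.zeros((300, 250))
recursive_matmul(A, B, C)
assert np.allclose(C, A @ B)
```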
Matrix Bidiagonalization on the Trident Processor
This paper discusses the implementation and evaluation of the reduction of a dense matrix to bidiagonal form on the Trident processor. The standard Golub and Kahan Householder bidiagonalization algorithm, which is rich in matrix-vector operations, and the LAPACK subroutine _GEBRD, which is rich in a mixture of vector, matrix-vector, and matrix operations, are simulated on the Trident processor....
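For reference, the Golub-Kahan procedure alternates left and right Householder reflections to drive a matrix to upper bidiagonal form. The sketch below is a minimal dense NumPy version for intuition only; it is unrelated to the Trident simulation and omits forming the orthogonal factors that _GEBRD also provides.

```python
import numpy as np

def house(x):
    """Householder vector v and scalar beta with (I - beta*v*v^T) x = ±||x|| e_1."""
    v = x.astype(float).copy()
    normx = np.linalg.norm(x)
    if normx == 0.0:
        return v, 0.0
    v[0] += np.sign(x[0]) * normx if x[0] != 0 else normx
    return v, 2.0 / np.dot(v, v)

def golub_kahan_bidiag(A):
    """Reduce A (m >= n) to upper bidiagonal form via alternating reflections."""
    B = A.astype(float).copy()
    m, n = B.shape
    for j in range(n):
        # Left reflection: zero out B[j+1:, j]
        v, beta = house(B[j:, j])
        B[j:, j:] -= beta * np.outer(v, v @ B[j:, j:])
        if j < n - 2:
            # Right reflection: zero out B[j, j+2:]
            v, beta = house(B[j, j + 1:])
            B[j:, j + 1:] -= beta * np.outer(B[j:, j + 1:] @ v, v)
    return B

A = np.random.rand(6, 4)
B = golub_kahan_bidiag(A)
# Only the diagonal and first superdiagonal should survive.
mask = np.triu(np.ones_like(B), k=2) + np.tril(np.ones_like(B), k=-1)
assert np.allclose(B * mask, 0)
```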
Algorithmic patterns for H-matrices on many-core processors
In this work, we consider the reformulation of hierarchical (H) matrix algorithms for many-core processors with a model implementation on graphics processing units (GPUs). H matrices approximate specific dense matrices, e.g., from discretized integral equations or kernel ridge regression, leading to log-linear time complexity in dense matrix-vector products. The parallelization of H matrix oper...
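The log-linear matrix-vector cost comes from keeping admissible off-diagonal blocks in low-rank factored form. The toy sketch below uses a flat two-by-two block partition as an illustrative simplification of that idea; real H-matrices use a recursive block tree, and all sizes and ranks here are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 512, 8                          # assumed block size and low rank
A11 = rng.standard_normal((n, n))      # dense diagonal blocks
A22 = rng.standard_normal((n, n))
U12, V12 = rng.standard_normal((n, r)), rng.standard_normal((n, r))
U21, V21 = rng.standard_normal((n, r)), rng.standard_normal((n, r))

def h_matvec(x):
    """y = A x with off-diagonal blocks A12 ~ U12 V12^T and A21 ~ U21 V21^T
    kept in factored form, so each costs two thin matvecs instead of O(n^2)."""
    x1, x2 = x[:n], x[n:]
    y1 = A11 @ x1 + U12 @ (V12.T @ x2)
    y2 = U21 @ (V21.T @ x1) + A22 @ x2
    return np.concatenate([y1, y2])

x = rng.standard_normal(2 * n)
dense = np.block([[A11, U12 @ V12.T], [U21 @ V21.T, A22]])
assert np.allclose(h_matvec(x), dense @ x)
```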
Chain Multiplication of Dense Matrices: Proposing a Shared Memory based Parallel Algorithm
Chain multiplication of matrices is widely used in scientific computing. It becomes more challenging when there is a large number of dense floating-point matrices, because floating-point operations take more time than integer operations. It would therefore be worthwhile to lower the time of such chain operations. Nowadays, every multicore processor system has built-in parallel computational power. This...
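One natural shared-memory formulation, sketched below purely for illustration (it is not necessarily the algorithm the paper proposes), reduces the chain pairwise in rounds so that the products within each round are independent and can run on separate cores.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_chain_matmul(mats, workers=4):
    """Multiply mats[0] @ mats[1] @ ... @ mats[-1] by pairwise tree reduction.
    NumPy releases the GIL inside matmul, so threads give real parallelism."""
    while len(mats) > 1:
        pairs = [(mats[i], mats[i + 1]) for i in range(0, len(mats) - 1, 2)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            reduced = list(pool.map(lambda ab: ab[0] @ ab[1], pairs))
        if len(mats) % 2:                 # odd-length chain: carry the last matrix over
            reduced.append(mats[-1])
        mats = reduced
    return mats[0]

chain = [np.random.rand(200, 200) for _ in range(8)]
assert np.allclose(parallel_chain_matmul(chain), np.linalg.multi_dot(chain))
```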
Dense matrix operations on a torus and a boolean cube
Algorithms for matrix multiplication and for Gauss-Jordan and Gaussian elimination on dense matrices on a torus and a boolean cube are presented and analyzed with respect to communication and arithmetic complexity. The number of elements of the matrices is assumed to be larger than the number of nodes in the processing system. The algorithms for matrix multiplication, triangulation, and forward...
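A classic example of such a torus algorithm is Cannon-style block multiplication, where operand blocks are skewed and then cyclically shifted around the mesh. The sketch below simulates a p x p torus sequentially with NumPy blocks; the shifting scheme is the textbook Cannon variant, used here as an illustrative stand-in rather than the authors' exact formulation.

```python
import numpy as np

def cannon_matmul(A, B, p):
    """Multiply A @ B on a simulated p x p torus of block 'processors'."""
    n = A.shape[0]
    b = n // p                                           # block size per node
    Ab = [[A[i*b:(i+1)*b, j*b:(j+1)*b].copy() for j in range(p)] for i in range(p)]
    Bb = [[B[i*b:(i+1)*b, j*b:(j+1)*b].copy() for j in range(p)] for i in range(p)]
    Cb = [[np.zeros((b, b)) for _ in range(p)] for _ in range(p)]
    # Initial skew: shift row i of A left by i, column j of B up by j.
    Ab = [[Ab[i][(j + i) % p] for j in range(p)] for i in range(p)]
    Bb = [[Bb[(i + j) % p][j] for j in range(p)] for i in range(p)]
    for _ in range(p):
        for i in range(p):
            for j in range(p):
                Cb[i][j] += Ab[i][j] @ Bb[i][j]          # local block product
        # Shift A blocks one step left and B blocks one step up around the torus.
        Ab = [[Ab[i][(j + 1) % p] for j in range(p)] for i in range(p)]
        Bb = [[Bb[(i + 1) % p][j] for j in range(p)] for i in range(p)]
    return np.block(Cb)

A, B = np.random.rand(120, 120), np.random.rand(120, 120)
assert np.allclose(cannon_matmul(A, B, p=4), A @ B)
```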
Journal title:
Volume/Issue:
Pages: -
Publication date: 2007